Auditory and photo-realistic audiovisual speech synthesis for Dutch
Abstract
Both auditory and audiovisual speech synthesis have been the subject of many research projects over the years. In recent years, however, very little research has focused on synthesis for the Dutch language. For audiovisual synthesis in particular, hardly any system or resource is available. In this paper we describe the creation of a new, extensive Dutch speech database containing audiovisual recordings of a single speaker. The database is constructed such that it can be employed in both auditory and audiovisual speech synthesis systems. We then describe how we achieve high-quality auditory speech synthesis by applying the database in our text-to-speech framework. In addition, we explain how we used the new database to attain photorealistic audiovisual text-to-speech synthesis for Dutch. The new database and its applications for synthesis are a significant addition to the resources for Dutch speech synthesis research.
Similar resources
2D Audiovisual Text-to-Speech Synthesis for Human-Machine Interaction in Dutch
Speech has always been the most important means of communication between humans. Therefore, using speech in machine-human communication can help in increasing the naturalness of the communication between a computer system and a user. Systems that can make a machine pronounce any given input text are referred to as text-to-speech systems. To further enhance the communication, a talking head can ...
Benefits of facial and textual information in understanding of vocoded speech
Exposure to audiovisually presented vocoded speech is more effective than exposure to auditory-only vocoded speech in improving the subsequent ability to understand vocoded speech [1]. In addition, improvements in the audiovisual training condition were more rapid and greater in magnitude than in the auditory-only condition. The current study was conducted to establish whether exposure to concu...
Audiovisual perception of congruent and incongruent Dutch front vowels.
PURPOSE: Auditory perception of vowels in background noise is enhanced when combined with visually perceived speech features. The objective of this study was to investigate whether the influence of visual cues on vowel perception extends to incongruent vowels, in a manner similar to the McGurk effect observed with consonants. METHOD: Identification of Dutch front vowels /i, y, e, Y/ that share ...
Rule-based visual speech synthesis
A system for rule-based audiovisual text-to-speech synthesis has been created. The system is based on the KTH text-to-speech system, which has been complemented with a three-dimensional parameterized model of a human face. The face can be animated in real time, synchronized with the auditory speech. The facial model is controlled by the same synthesis software as the auditory speech synthesizer....
Speech-specificity of two audiovisual integration effects
Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers often fail to perceive as speech. While audiovisual integration in the identification task only occu...